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Abstract. Research into quantum algorithms for NP-complete prob- 
lems has rekindled interest in the detailed study a broad class of com- 
binatorial problems. A recent paper [1] applied the quantum adiabatic 
evolution algorithm to the Exact Cover problem for 3-sets (EC3), and 
provided an empirical evidence that the algorithm was polynomial. In 
this paper we provide a detailed study of the characteristics of the ex- 
act cover problem. We present the annealing approximation applied to 
EC3, which gives an over-estimate of the phase transition point. We also 
identify empirically the phase transition point. We also study the com- 
plexity of two classical algorithms on this problem: Davis-Putnam and 
Simulated Annealing. For these algorithms, EC3 is significantly easier 
than 3- SAT. 


1 Introduction 

In the early 1990s it was realized that many problems in the class of NP-complete 
problems [2] exhibit phase transitions, and that the peak in the complexity of 
practical algorithms occurs at the phase transition [3-5]. The example used in 
most of these studies is the 3-sat problem, and extensive studies of the behaviour 
of 3-sat have been performed [6, 7]. Recently, however, there has been an upsurge 
in interest in other problems in the NP-complete class. This has been due to 
the rise in interest in quantum computation. Three problems that have been 
approached using the Quantum Adiabatic Evolution Algorithm (QAA) [8] are, 
the exact cover problem for 3-sets (EC3) [1], the set partition problem [9], and 
3-sat [10]. In this paper we present the annealing approximation analysis of 
EC3, and the behaviour of two classical algorithms (the Davis-Putnam (DP) 
algorithm [11] and Simulated Annealing (SA) ) applied to it. We present results 
that identify strongly the region of the phase transition. 

Because DP has a very small factor in the exponent (see section 3.4), studying 
the asymptotic scaling of the complexity requires N 20, the maximum value 
used in [1]. However, because the inner workings of QAA are much closer to 
SA than to DP [9, 12], we also study the complexity of SA, to try to determine 
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at what value of N the asymptotic regime is achieved. The complexity of SA 
scales with problem size 4 as exp(0.023 x N). This does not give much support 
to determining asymptotic exponential complexity from the empiricardata that 
correspond to N < 20. 

An N-bit instance of EC3 is built up from clauses, each of which is a constraint 
imposed on the values of three of the bits. If a given clause involves the three bits 
labeled i, j and k, then the constraint is satisfied if Z{ + zj -f z k = 1, {z% € {0, 1}) 
and is violated otherwise. There are 8 possible assignments for the values of Zi, 

and z k ) but only 3 out of the 8 satisfy the constraint. It is clear that satisfying 
assignments have one of three bits with value 1, and the remaining two with value 
0. An N-bit instance of EC3 is a list of triples (i,j, k) (with distinct indexes in 
each triple) indicating which groups of three bits are involved in the clauses. 
The decision problem is to determine whether or not there is some assignment 
of the N bit values that satisfies all of the clauses. In general, there may not 
be any assignment that satisfy all the clauses and an optimization variant of 
EC3 is to find an assignment that violates the least number of clauses (in what 
following we shall refer to both variants of the problem simply as EC3, instead 
of using more conventional notation MAX-EC3 for optimization variant of this 
NP-complete problem). 


2 Annealed Approximation Theory 

The cost function E x in EC3 is defined on a set of 2 N possible N-bit strings 
z = {z\) z 2 , . . . , zn}- It is equal to the number of constraints that are violated 
by a given assignment z. We introduce an M x N matrix A with M rows A m , 
one for each clause, and N columns, one for each bit in the N - bit string. Each 
row has three elements equal to 1, with the rest equal to 0. The elements of the 
m-th row A m j — 1 occur at distinct positions l — i,j,k that coincide with the 
triple of indices in the corresponding clause. Then a given clause A m is satisfied 
by an assignment z iff A m • z = 1, and the cost function can be written in the 
following form: 


N N 

E z = M — ^ ^ A (A m * z, 1) , A m * z = ^ ^ A m j zi . (1) 

m=l 1 

Here M is the total number of clauses and A(k,l) is a Kronecker delta. The 
solution of EC3 is an assignment with minimal E z . In what follows we shall 
refer to a given instance of EC3 simply as A . 

Consider a set X of all instances of EC3 with N bits and M distinct clauses. 

Due to the fact that the size of X can be very large (equal to ( ^- ax j , where 

4 Note that in a random problem instance, some of the bits are not included in any 
clauses. This differs from how the problem instances were generated in [1] and results 
in a smaller effective N (by a factor of ss 1.18) 
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) it is convenient to introduce the statistical distribution of in- 


stances within 1 and sample from this distribution while studying the instance- 
dependent quantities such as algorithm complexity, number of solutions, etc. 
In this uniform sampling model with statistically independent clauses, one ran- 
domly chooses a triple of distinct indexes (i,j, k) from the interval (l,n) and 
independently repeats this procedure M times - once for each row of A . 


2.1 Energy Distribution Function and Phase Transition in EC3 in 
the Annealing Approximation 

The simplest energy distribution characterizing a given instance of EC3 is de- 
scribed by the following function: 

P(E-,A) = 2~ n A l E *’ E l ( 2 ) 

z£{0,1} n 

Here P(E; A) is the fraction of all possible assignments of z that violate exactly 
E clauses (E = 0, 1, . . . , M) of an instance A. Qualitatively the behavior of the 
function P(E; A) can be analyzed by averaging it over the problem instances: 
P(E) = (P(E;A))a‘ To implement this we introduce the probability p z that a 
single randomly generated clause is satisfied by a given assignment of z: 

p z = b(h( z)), h(z) = J2zj, b(Q) = Q (^ . (3) 

Note that this probability in EC3 depends only on the Hamming weight of 
a string h( z) (i.e. the number of unit bits in it). With the assumption that 
clauses are independent, the ensemble-averaged distribution P(E) = {P{E\ A)) a 
is given by a sum of binomial distributions (each corresponding to a certain value 
of the Hamming weight Q) 

P(E) = 2~ n E (g) P(E,Q), P(E,Q ) = (f y M ~ E (Q) (1 - b(Q)f. 

.(4) 

(E = 0, 1, . . . , M). Direct averaging of quantities like (2) over the ensemble of A 
is usually called in spin-glass theory an 55 annealing” approximation [13]. 

For a given problem instance the number of solutions always has a multiplica- 
tive factor equal to 2 a N where a is the fraction of bits that are not included in 
any clauses. The average value of a is 

- _ {N-1\ M ( n\~ m 
a -{ 3 ) ( 3 ) ’ 

z x M x 

a exp (— 37 ) 7 = — (N 00 ), 


( 5 ) 
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and it decreases exponentially with the ratio 7 = M/N. 

Consider now the average number of bit configurations that violate exactly 
~~E clauses, 2 N ~PJE), normalized by "themuTtiplicative factor 2“^ 

r{E) = 2 n ~ &n P{E). (6) 

For small E values E M 1 / 2 and in a certain range of 7 the function f(E) 
takes the following form: 

f(E) « ^ [% (C - l)f r N(7_%) . 17 - 7c| « 1- (7) 

Here 7 — M/N, the constants % « 0.703, and £ & 2.29. Eq. (7) describes the 
exponential growth of the number of assignments with E at small E. 

We will compare this annealing approximation to the critical value {% « 
0.603) with the experimental value to be determined later (section 3.3). 

We note that if one fixes the value 7 < in (7) and considers the asymp- 
totic limit N — > 00 then the ensemble-average fraction of normalized number of 
solutions f(0) will grow exponentially with N. However if 7 > j c then P(0) is 
exponentially decreasing with N. Therefore Eq. (7) describes a phase transition 
in the number of satisfying assignments, E z = 0, between the “over-constrained” 
(7 > 7c) and under-constrained (7 < %) regimes of EC3 as a control parameter 
7 = M/N varies through the critical region 7 « j c . 

The average minimum energy E m i n = 0 for 7 < 7 C , and for 7 > 7 C it satisfies 
the condition P(E m i n ) = 1. From this one can obtain: 

£min = N ( 7 - 7c) log C/ log ( (7 _ 7 J ^ g C -) ’ i'T > 7c)- (») 

Therefore in the over-constrained range 7 > 7 c the minimum energy grows 
linearly with N (see section 3.6 for a comparison with the experimental results). 
Comparison between the analytical result P{E) (4) and simulations of the exact 
function P{E\ A) computed for randomly generated problem instances is shown 
in figure L We note that more detailed analysis based on the method of replicas 
[13] reveals the deviation of P(E; A) from the annealing approximation based 
on Eq.(4). 

Because the probability for a randomly-generated clause to be satisfied by 
a given assignment depends only on the Hamming weight of this assignment it 
is of interest to consider the average number of bits in a satisfying assignment 
(the solution). By taking the sum in the (4) over Q with the method of steepest 
descent, the dominant contribution comes from the small vicinity of the value 
Q = Q * , which is a stationary point of the integrand in (4). After setting E — 0 
we obtain the stationary point equation in the following form 

1 — t* O* M 

x*(l - X *) log - 27 X* + 7 = 0, X* = 7 = (9) 

Its solution for the critical value of 7 = 7 C is x * ~ 0.3808. . . that gives the 
fraction of unit bits in the solution string. Experimentally, the fraction was 0.38. 
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Fig. 1 . Logarithmic plot of the distribution of the cost values in EC3: rectangles corre- 
sponds to numerical simulations of P{E\ A) vs E based on (2); solid curve corresponds 
to analytical expression (4) P(E) vs E in the annealing approximation. 


We note that a true critical value for the ratio M/N at the phase transition 
point is different from that given in annealing approximation because the disor- 
der effect is significant on the tail of P(E ; ^4) (for E — ► 0) that determines that 
ratio. Detailed numerical study of the phase transition point is given below. He 
we note in passing that near the maximum of P(E; A) this function is well ap- 
proximated by the approximate result (4). In particular it has a Gaussian form 
in this range of E , with the mean value and variance given below: 

(E)=E mliX = ^N, ( E-(E)) 2 =a 2 (0)N , cr 2 (0) « ^(3 7 2 +5 7 ) + 0(1). 

(10) 

Here the expression for E max determines the value of cost that occurs most 
often if the assignments z are sampled at random. This value is independent 
of particular instances of EC3. The normalized standard deviation cr(0) is a 
self- averaging quantity given in (10). In the limit N oc the first and second 
moment correctly describe the form of the distribution near the maximum | E — 
Emax | N , which is not sufficient however for the analysis of the local search 
algorithms such as simulated annealing that spend most of the time for small 
values of E ~ 1. 


3 Experimental Results 


In this section we present experimental results on random EC3 instances. Using 
the Davis-Putnam (DP) algorithm and Simulated Annealing (SA), we study the 
crossover point, the computation complexity and the mean minimum energy of 
solution. We also identify experimentally the position of the phase transition. 
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Table 1 . Outline of the Davis-Putnam Algorithm 


Find_Model( theory ) 

unit_propagate( theory ); 

if contradiction discovered return(f alse) ; 
else if all variables are valued return(true) ; 
else { 

x = some unvalued variable; 

return ( Find_Model( theory AND x ) OR 

Find_Model( theory AND NOT x ) ); 

> 


3.1 The Davis-Putnam Algorithm 

The Davis-Putnam (DP) algorithm [11], or a variation, is regarded as the most 
efficient complete algorithm for satisfiability problems. An outline of the DP 
algorithm is given in table 1 [6]. The version we used varies from this outline 
in one major respect. We perform a sort of the variables before the first call to 
Find_Model, sorting on the number of clauses which use the variable. This was 
found to produce, on average, a very large speed-up in the algorithm’s execution. 

The unit_propagate step of the algorithm is also extremely efficient for 
the exact cover problem. Once one variable in a clause is set to 1, the value of 
the other two variables is fixed, and extensive propagation often occurs. Also, 
because a single variable in a clause being set to 1 determines the other two 
variables in the clause, we call Find_Model( theory AND x ) first. 


3.2 The Formation of a Super Cluster 

A basic characteristic of problems that exhibit a phase transition is the formation 
of a super-cluster [14]. If the constraints are used to cluster the variables into 
groups that are (indirectly) connected, then once the number of constraints 
reaches a critical density, a “super-cluster” forms that includes the vast majority 
of the variables that are involved in the clauses. In Figure 2 we plot the ratio 
of the size of the second largest cluster to the size of the largest cluster, for 
increasing N. The phase transition is clearly seen. 


3.3 The Crossover Point 

The major feature of a phase transition in a satisfiability problem is the presence 
of a threshold in 7, below which almost all random problem instances are solv- 
able, and above which almost no random problem instances are. Figure 3 shows 
a plot of the proportion of random problem instances that have a satisfying 
assignment, versus 7, for various values of N . The proportions are based on run- 
ning the DP algorithm on 50,000 random problem instances for each value of N 
and 7. The expected features are present. The sharpness of the phase transition 
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increases with N, and the point at which the curve crosses the line where the 
proportion of instances with a satisfying assignment equals 0.5 decreases with 
TV . ~ _ “ ' 

Experimentally the crossover point is at j c w 0.625. Note that this is some- 
what lower than j c = 0.703 given by the annealing approximation, showing that 
the tails of P(E ; A) are important. In figure 4 (lower curve) we plot the value 
of 7 for which 50% of the problem instances were satisfiable as a function of the 
number of bits. The curve appears to have an asymptote around « 0.625. 



Fig. 4, Top curve: plot of the maximum complexity of the DP algorithm. Lower curve: 
the position of the crossing point of proportion with satisfying assignment = 0.5 


3.4 Complexity of the Davis-Putnam Algorithm 

Figure 5 shows plots of the median complexity of the Davis-Putnam (DP) algo- 
rithm, where complexity is defined as the number of calls to Find_Model (Table 
I). The median was taken over 50,000 random problem instances. As expected, 
because the DP algorithm is complete, its performance scales exponentially with 
problem size, N. Note also that the value of 7 for which the maximum complex- 
ity occurs is above 7 C , and slowly reduces as N increases. In figure 4 (upper 
curve) we plot the position of the maximum complexity and its uncertainty, we 
note that for the range of values of N considered, it does not appear to have 
converged to an asymptote, but the curve does not appear to contradict our 
earlier result of 7 C « 0.625. 

Fitting an exponential law to the peak complexity gives C = 6.13 exp(0.0067 x 
A), a very slow rate of increase - an order of magnitude slower than reported 
results on the complexity of DP applied to 3 -SAT [ 6 ]. 
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Fig. 5. Computational complexity of DP (all problem instances) 


3.5 Complexity of Simulated Annealing 


In order to quantify the complexity of Simulated Annealing (SA), it is neces- 
sary to be able to determine when the minimum energy configuration has been 
reached. In order to be able to do this, when a random problem instance was 
generated, we first used DP to determine if it had a zero-cost solution. If so, then 
SA was applied; if not, the problem instance was rejected and a new random 
problem instance was generated, etc. 

We used an exponential cooling schedule, T = Toa k , where an iteration k 
is one attempted flip of all N bits. For each problem instance, we set Tq — 2, 
ran a range of values of a, and ran the SA algorithm 100 times. We define 
complexity as follows. For those runs that found the zero-cost solution, compute 
the mean number of iterations 5 . The complexity is defined as this mean number 
of iterations, divided by the proportion of the 100 runs that found a zero-cost 
solution. In figure 6 we plot the minimum complexity over the values of the 
cooling schedule, a. The flattening off of the curves for low values of 7 is due to 
the lowest values of a used being too high in this region. 

Note that the scaling behaviour is very similar to that for the DP algo- 
rithm, but that the rate of increase of complexity with N is much larger (C oc 
exp(0.023 x N)). 
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3.6 Scaling of the Minimum Energy Close to the Phase Transition 

We also used SA to obtain the minimum energy for random EC3 instances close 
to the phase transition. Figure 7 plots the mean (over 800 instances) minimum 
energy versus N for values of 7 near the phase transition. These graphs quali- 
tatively support the scaling behaviour given by the annealing approximation in 
section 2 (see eq. (8)). 

4 Conclusions 

We have presented theoretical and experimental results on the algorithmic com- 
plexity and position of the phase transition for the EC3 problem. The annealing 
approximation gives % = 0.703 and experiments suggest 7 C « 0.625. Unpub- 
lished results [15] give analytic bounds of 0.517 < 7 C < 0.727 and are in good 
agreement with our theoretical and experimental values. 

In passing we want to note that a recent paper [10] considered simulations of 
Q AA for the 3SAT problem and its comparison with the results of classical search 
algorithms, including GSAT, at the phase transition point. The exponential fit 
to GSAT was exp( 0.13 x N) which was about the same as for QAA for that 
problem. Recall that SA for EC3 has complexity exp(0.023 x N). We studied 
the complexity of the simplest local search method for global optimization, SA, 
because the dynamics of this algorithm can be viewed as the closest classical 
counterpart to QAA, and may cast light on the difficulty of solving EC3 using 
QAA. 
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